SUMMARY


The generative power of human language depends 
on grammatical rules, such as word ordering, that 
allow us to produce and comprehend even novel 
combinations of words [1-3]. Several species of 
birds and mammals produce sequences of calls 
[4-6], and, like words in human sentences, their order 
may influence receiver responses [7]. However, it 
is unknown whether animals use call ordering to 
extract meaning from truly novel sequences. Here, 
we use a novel experimental approach to test this 
in a wild bird species, the Japanese tit (Parus minor). 
Japanese tits are attracted to mobbing a predator 
when they hear conspecific alert and recruitment 
calls ordered as alert-recruitment sequences [7]. 
They also approach in response to recruitment calls 
of heterospecific individuals in mixed-species flocks 
[8, 9]. Using experimental playbacks, we assess their 
responses to artificial sequences in which their own 
alert calls are combined into different orderings 
with heterospecific recruitment calls. We find that 
Japanese tits respond similarly to mixed-species 
alert-recruitment call sequences and to their own 
alert-recruitment sequences. Importantly, however, 
tits rarely respond to mixed-species sequences 
in which the call order is reversed. Thus, Japanese 
tits extract a compound meaning from novel call 
sequences using an ordering rule. These results 
demonstrate a new parallel between animal commu- 
nication systems and human language, opening new 
avenues for exploring the evolution of ordering rules 
and compositionality in animal vocal sequences. 


RESULTS AND DISCUSSION 


Japanese tits produce combinations of calls in particular con- 
texts, which evoke specific responses in receivers [7]. They pro- 
duce alert (ABC) calls when detecting a variety of predatory
threats, they produce recruitment (D) calls when attracting flock
members in non-dangerous situations, and they combine these 
two calls into alert-recruitment (ABC-D) sequences when recruit- 
ing other flock members to mob a stationary predator (Figure 1A; 
[7, 10]). Previous playback experiments showed that receivers 
respond to alert (ABC) calls by scanning the surroundings for 
potential danger, recruitment (D) calls by approaching the sound 
source, and alert-recruitment (ABC-D) sequences by mixing 
these two responses, i.e., scanning for danger while simulta- 
neously approaching the sound source [7]. However, when the 
call order is artificially reversed (D-ABC), tits rarely exhibit either 
alert or approaching responses [7]. 

These findings suggest that tits use call ordering (i.e., alert- 
recruitment ordering rule) to extract a compound meaning from 
call sequences, which provides one of the first examples of 
compositional expressions in non-human animals [3, 11, 12]. 
However, Japanese tits may not assess the meanings of individ- 
ual call units in a sequence but rather perceive the whole 
sequence as an idiomatic expression with a unique message 
(e.g., “mobbing” instead of “alert + approach”; see [13-15]).
Furthermore, it is also possible that they may respond more 
strongly to naturally combined sequences (ABC-D) simply 
because these sequences are more familiar than artificially 
reversed ones (D-ABC). If Japanese tits derive compound mean- 
ings from combinations of calls only when the meanings are 
given in a specific order, we predict that they should also derive 
compound meanings from novel combinations following the 
same ordering pattern (e.g., alert-recruitment ordering). 

Here, we explore whether Japanese tits use an ordering rule to 
decode novel call sequences. Japanese tits form mixed-species 
flocks with willow tits (Poecile montanus) during the non- 
breeding season. Willow tits produce “taa” calls to attract 
both conspecific and heterospecific flock members, including 
Japanese tits (Figure 1B; [8, 9]). Like D calls of Japanese tits, 
taa calls are used in a variety of contexts, such as when arriving 
at a new foraging patch or when mobbing a predator [16-18]. 
Thus, despite differences in their acoustic structure, both willow 
tits’ taa calls and Japanese tits’ D calls have a general recruit- 
ment function. We use this synonymy to artificially generate 
novel call sequences by combining conspecific alert calls 
(ABC) and heterospecific recruitment calls (tää) into different orderings
(ABC-tää and tää-ABC sequences; Figure 1C). If tits use
an alert-recruitment ordering rule to decode novel call sequences,
they are expected to respond to artificial ABC-tää sequences
similarly to synonymous ABC-D sequences but only
weakly to reversed tää-ABC sequences. In contrast, if they do
not use a call ordering rule when hearing the mixed call sequences,
they are expected to show similar responses to
ABC-tää and tää-ABC sequences.

Figure 1. Experimental Setup
(A) Sound spectrograms of the natural alert-recruitment call sequence of Japanese tits given when mobbing a predator.
(B) Sound spectrograms of separate recruitment and alert calls of willow tits.
(C) Sound spectrograms of the novel call sequences composed of calls of Japanese and willow tits. As in the natural call sequences of Japanese tits, calls from the two species are artificially combined with an interval of 0.1 s.

We tested the response of free-living Japanese tits (n = 28) to 
the playback of ABC-tää or tää-ABC call sequences. Each stim- 
ulus contained 30 call sequences in 90 s, matching the structure 
of natural calling bouts in predator mobbing contexts [10]. After 
locating a flock of Japanese tits, call sequences were played 
back from a loudspeaker hung from a tree branch. Behavioral 
observations were made on a focal individual that was the 
closest individual to the loudspeaker among the flock members 
when each trial was started. All the playback trials were con- 
ducted at different locations separated by at least 400 m, 
ensuring that all the data were collected from different individ- 
uals [8] (see STAR Methods). 


Responses to Novel Sequences Depend on Call Ordering 

During the playback of ABC-taa sequences, Japanese tits 
scanned the surroundings by turning their heads from side to 
side (median = 13 head turns per 90 s), while at the same 
time approaching the loudspeaker with hops and flights to 
within 2 m (n = 12 out of 14 focal birds; Figure 2). Seven out 
of 14 focal individuals approached to within 1 m of the loud- 
speaker. In contrast, in response to taa-ABC sequences, tits 
made fewer head turn scans (median = 5 per 90 s; least-square 
mean: Z = 4.47, p < 0.001) and more rarely approached the 
loudspeaker within 2 m (n = 2 out of 14 focal birds; Z = 3.32, 


2332 Current Biology 27, 2331-2336, August 7, 2017 


p < 0.001; Figure 2). Also, only 1 out of 14 focal individuals ap- 
proached within 1 m. Thus, when hearing novel combinations 
of ABC and taa calls, Japanese tits exhibit stronger responses 
when the component calls follow the alert-recruitment ordering 
rule. 
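As a rough cross-check of the approach counts reported above (12 of 14 focal birds for ABC-tää versus 2 of 14 for tää-ABC), the sketch below runs a two-sided Fisher's exact test in Python. This is not the analysis used in the paper, which relies on generalized linear models. The function name and the choice of test are assumptions made purely for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are treatments; columns are (approached, did not approach).
    Illustrative helper, not the authors' statistical method.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k approaches in row 1, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(prob(k) for k in range(lo, hi + 1)
               if prob(k) <= p_obs * (1 + 1e-9))


# ABC-tää: 12 approached, 2 did not; tää-ABC: 2 approached, 12 did not.
p_value = fisher_exact_two_sided(12, 2, 2, 12)
print(f"two-sided p = {p_value:.5f}")  # p < 0.001 for these counts
```

With only 14 birds per treatment, an exact test avoids large-sample approximations, and it agrees with the reported conclusion that approach depends on call order.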

The responses of Japanese tits to ABC-tää sequences were
indistinguishable from those expressed in response to ABC-D se-
quences (Figure 2; scans: Z = 0.20, p = 0.84; approach within
2 m: Z = 1.29, p = 0.20; see also [7]), although we note that 
ABC-D playbacks were conducted in a previous year, not allow- 
ing us to account for potential year-to-year variation. Together, 
these results indicate that, when hearing novel ABC-taa se- 
quences, Japanese tits extract information equivalent to their 
own ABC-D sequences and that, like sequences containing 
ABC and D calls, call ordering determines responses. 


ABC-First Ordering Does Not Explain Responses 

Although Japanese tits discriminate between ABC-taa and taa- 
ABC sequences, it might be possible that they respond to any 
sequence beginning with ABC, irrespective of the call that fol- 
lows, because their native and therefore familiar mobbing call se- 
quences (ABC-D) typically begin with this call type. We assessed 
this possibility by combining alert calls of Japanese tits (ABC 
calls) with alert calls of willow tits (zi calls; Figure 1B; [18-20]),
producing ABC-zi and zi-ABC sequences (Figure 1C). If ABC 
calls appearing first in a sequence are enough to elicit scanning 
and approach behavior, tits should respond more to ABC-zi than 
to zi-ABC sequences. In contrast, if ABC appearing first is not 
sufficient to explain responses to a given call sequence, tits 
are expected to respond similarly to ABC-zi and zi-ABC sequences.
We tested the responses of Japanese tits (n = 28) to
the playbacks of these two sequences in the same manner as in the
first experiment (see STAR Methods).

Figure 2. Response of Japanese Tits to Playbacks of Natural and Novel Call Sequences
(A) Number of horizontal scans made by the focal individuals during 90 s. The
box-and-whisker plots display the median and 25% and 75% quartiles; the
whiskers are extended to the most extreme value inside the 1.5-fold interquartile
range.
(B) Percentage of trials in which the focal individuals approached within 2 m of
the loudspeaker.
(A and B) Natural call sequences (ABC-D) consist of their own calls only,
whereas novel call sequences are composed of their own alert calls (ABC) and
either recruitment (tää) or alert (zi) calls of willow tits. Data for ABC-D are
derived from experiment 2 of our previous study [7]. Sample size: n = 17 individuals
for ABC-D; n = 14 individuals for each of ABC-tää, tää-ABC, ABC-zi,
and zi-ABC. Each individual was exposed to only one treatment.

In response to both ABC-zi and zi-ABC sequences, Japanese 
tits exhibited some horizontal scans (ABC-zi: median = 4; zi- 
ABC: median = 5) but rarely approached the loudspeaker (within 
2 m: ABC-zi: n = 1 out of 14 focal birds; zi-ABC: n = 2 out of 14 
focal birds; Figure 2). They produced similar numbers of scans 
(Z = 0.47, p = 0.64) and were similarly likely to approach (Z = 
0.60, p = 0.55) in response to playbacks of either of these two 
sequences. Therefore, order-dependent responses to novel se- 
quences depend on the specific calls included rather than 
whether or not ABC calls appear first. 

Furthermore, our data allow us to reject the possibility that any
modifier after ABC evokes mobbing-like responses in Japanese
tits. Focal individuals made fewer scans and approached the
loudspeaker less often during the playback of ABC-zi sequences
than during the playback of ABC-tää sequences
(scans: Z = 2.52, p = 0.02; approach: Z = 3.38, p < 0.01)
or ABC-D sequences (scans: Z = 2.82, p = 0.01;
approach: Z = 2.75, p < 0.01). Thus, simply hearing ABC followed
by a different call type is not sufficient to evoke either scanning or
approach responses in Japanese tits.

Figure 3. Responses to D and Tää Calls by Japanese Tits Are Not Caused by Sound Similarity
Proportion of flock members that approached within 2 m of the loudspeaker during
3 min of playbacks. Japanese tits frequently approached within 2 m of the
loudspeaker during playback of D calls and often during tää calls. However,
they rarely approached in response to artificially shortened tää calls having an
increased similarity to D calls (2 m: χ² = 41.8, df = 2, p < 0.0001). The box-and-whisker
plot displays the median and 25% and 75% quartiles; the whiskers are
extended to the most extreme value inside the 1.5-fold interquartile range.
Sample size: n = 14 flocks for each treatment. See also Figure S1.


Acoustic Similarity Does Not Drive Responses 

Although D and taa calls differ in a number of acoustic features 
(Figure 1), there remains a possibility that some features cause 
Japanese tits to perceive taa calls as their own D calls and, 
consequently, perceive ABC-taa calls as ABC-D sequences 
(see [21]). To test this possibility, we compared the responses 
of Japanese tits to playbacks of D calls, taa calls, and artificially 
modified taa calls. The artificially modified taa calls were gener- 
ated by shortening the duration of taa calls to 50% of their natural 
length, making their duration similar to that of D calls (Figure S1A) 
and to the calls of other related species within the genus Poecile 
[22]. A spectrographic cross-correlation analysis confirmed that 
modified short-taa calls have greater similarity to D calls than 
original tää calls (Figure S1B). If tits perceive D and tää calls as 
the same call type because of their acoustic similarity, then 
they are expected to approach in response to short-taa calls in 
a similar way to D calls and original taa calls. In contrast, if Jap- 
anese tits categorize D and taa calls as distinct calls albeit with 
similar meanings, they are expected to approach D and taa calls, 
but not the unfamiliar short-taa calls. We tested the response 
of Japanese tits (n = 42 flocks) to each of the three call types 
(60 calls in 180 s). 

Japanese tits approached the loudspeaker during the play- 
backs of D calls and taa calls. A larger proportion of flock 
members approached within 2 m of the loudspeaker during the 
playback of D calls (median = 83% of flock members) than during 
the playback of taa calls (median = 13%) (Z = 3.49, p < 0.001). 
However, tits almost never approached the playback of short- 
taa calls (median = 0%; Figure 3) (short-taa versus D calls: Z = 
4.01, p < 0.001; short-taa versus taa calls: Z = 2.30, p = 0.02). 
These results indicate that Japanese tits categorize D and taa 
calls as different call types but extract a similar “approach” 
meaning from these calls. Therefore, equivalent responses to 
ABC-taa and ABC-D sequences are evoked by similarity in the 
meanings of their component calls, but not by similarity in their
acoustic features. 


Compositionality Drives Responses to Call Sequences 
Japanese tits respond to the playbacks of novel alert-recruit- 
ment call sequences (ABC-taa) and their own alert-recruitment 
sequences (ABC-D) by simultaneously scanning and approach- 
ing, behaviors that tits express independently when hearing alert 
(ABC) and recruitment (taa and D) calls, respectively (Figure 2; 
[7]). The fact that Japanese tits fail to respond to novel short- 
taa calls suggests that they extract information from willow tit 
taa calls only after learning to associate them with recruitment 
contexts [23] rather than recognizing any calls with similar 
acoustic structure innately [21]. Weak responses to ABC-zi 
sequences allow us to reject the possibility that responses to 
ABC-D and ABC-taa sequences are driven by a process where 
ABC calls prime tits to expect and, therefore, perceive any 
following call as a recruitment call (Figure 2). Moreover, given 
that tits perceive taa and D calls as different calls (Figure 3), 
equivalent responses to ABC-taa and ABC-D sequences are 
not caused by similarity in acoustic features of component calls. 
Instead, our results strongly support the idea that tits extract an 
equivalent, compound meaning from both their own (ABC-D) 
and novel (ABC-taa) call sequences by assessing and combining 
the meanings of individual call units (alert + approach), but not 
by perceiving the whole sequences as an idiomatic expression 
(e.g., unique mobbing message). 


Compositionality Depends on Call Ordering 

Japanese tits exhibit mixed alert and approach responses to 
novel alert-recruitment sequences (ABC-taa) but only weakly 
respond to the reversed recruitment-alert sequences (taa- 
ABC). This result demonstrates that, as when responding to
their own sequences (ABC-D), Japanese tits use call ordering
(i.e., alert-recruitment ordering) to decode novel sequences. These findings
contrast with those from a previous study on monkeys.
Although Diana monkeys (Cercopithecus diana) respond to 
heterospecific Campbell’s monkey (Cercopithecus campbelli) 
“boom” calls when combined with Campbell’s alarm calls, 
they ignore boom calls when artificially combined with conspecific
alarm calls [24]. Diana monkeys do not produce boom calls
or an equivalent call themselves [5, 24] and, thus, are unlikely to
have their own ordering rule for the recognition of boom-related
call combinations. As a result, Diana monkeys may not understand
the semantic link between Campbell’s boom calls and
their own alarm calls. In contrast, Japanese tits use an ordering 
rule when combining their own calls into sequences (ABC-D se- 
quences) [10] and extract similar information from willow tits’ taa 
calls and their own D calls, which may allow them to generalize 
their ordering rule to novel ABC-taa sequences. 

A critical aspect of compositionality in human language is 
that the meanings of word sequences (e.g., “John’s small, 
blue socks”) are not always generated from simply combining 
the meanings of the component words (e.g., “John,” “small,” 
“blue,” “socks”). In this case, grammatical rules, such as word 
order, help us understand the intended meaning of the overall 
expression; such modification of meanings is known as non-triv- 
ial compositionality [12]. In contrast, trivial compositionality cor- 
responds to combinations where the meaning of a sequence 
(e.g., “small, blue”) is generated from a simple concatenation
of the component words (“small” + “blue”) [12]. Our experiments 
show that Japanese tits produce compound responses to se- 
quences only when the calls are ordered into alert-recruitment 
sequences with a natural inter-call interval (i.e., 0.1 s) [7], sug- 
gesting that they might extract meaning in a non-trivial way. 
However, it is also possible that pragmatic rules (i.e., ordering 
of information units), but not grammatical rules, help tits discrim- 
inate between different orderings of call sequences [25, 26]. 
Future studies with artificially manipulated inter-call intervals 
could help to determine the cognitive mechanisms underlying 
the perception of call sequences. 


Evolutionary Implications and Future Direction 

It remains challenging to assess the evolutionary drivers that 
facilitate the evolution of call ordering rules. Clearly, both vocal 
signals and call ordering should be under strong selection to 
ensure that the message is transmitted effectively to receivers 
[11, 27]. In natural ABC-D call sequences of Japanese tits, the 
more urgent alert call is usually given before the less urgent 
social recruitment call. Urgency-first call sequences have 
been found in other birds [28, 29] and mammals ([30], but see 
[31]), allowing signalers to focus the attention of receivers on 
the predatory danger and, likewise, benefiting receivers by 
allowing them to modulate their behavior adaptively. In Japa- 
nese tits, alert calls unambiguously communicate a risky situa- 
tion and always appear at the beginning of sequences, while 
the subsequent recruitment calls are given in a range of different 
contexts that vary in their risk [7, 10]. Similarly, in call sequences 
of Campbell’s monkeys, calls with less specific information are 
produced after more informative calls, where suffixation of 
“oo” generalizes the meaning of predator-specific alarm calls 
[32, 33]. No matter the reason, once a specific call ordering 
becomes fixed, it is likely that receivers would express weaker 
responses to aberrant orderings. 

To conclude, our findings demonstrate that Japanese tits use 
an ordering rule to extract meanings from novel call sequences, 
representing a new analogy between animal communication 
systems and human language [3, 12]. Japanese tits and other 
members of the family Paridae are known to produce multiple 
note types and show a corresponding diversity of call combina- 
tions [22, 34]. The function of this vocal elaboration is still poorly 
understood, but the results presented here suggest that it may 
be underlain by call ordering rules that allow individuals to 
increase the number of messages that they can generate by 
combining their calls and associated meanings. Uncovering 
and comparing animal vocal sequences across species will likely 
reveal new aspects of the sophistication and complexity in ani- 
mal communication systems, which may shed light on the evolu- 
tion of compositionality and grammatical rules in language. 
EXPERIMENTAL MODEL AND SUBJECT DETAILS


Data were collected from wild Japanese tits (56 individuals for the first experiment, 42 flocks for the second experiment). Experiments 
were performed in accordance with relevant guidelines and regulations. All the experimental protocols were approved by SOKENDAI 
Ethics Committee for Animal Research and adhered to the Guidelines for the Use of Animals in Research of the Animal Behavior So- 
ciety/Association for the Study of Animal Behavior. This research was performed under permission from the Ministry of the Environ- 
ment and the Forestry Agency of Japan. 


METHOD DETAILS 


Responses to novel call sequences 

We designed experiments to test whether Japanese tits use an alert-recruitment call ordering rule to extract information from novel
call sequences. If this is the case, tits are expected to show stronger responses to novel call sequences that follow the alert-recruitment
ordering than to those that do not. To test this hypothesis, we prepared two novel sequences by combining the alert call of Japanese
tits (ABC) and the recruitment call of willow tits (tää) in different orderings: ABC-tää and tää-ABC sequences (Figure 1C). However,
because Japanese tits always produce ABC calls before D calls when combining them, it is possible that familiarity with ABC-first 
sequences causes any differential response. To test this possibility, we prepared two additional sequences by combining the alert 
call of Japanese tits (ABC) and that of willow tits (zi) to produce ABC-zi and reversed zi-ABC sequences (Figure 1C). Both taa and zi 
calls of willow tits are often produced in mixed-species flocks [8, 9, 16], and thus, Japanese tits are familiar with both of these calls. 

The calls used for this study were recorded during previous field experiments by T.N.S. [10, 35]. ABC notes of Japanese tits were 
recorded by exposing either a stuffed jungle crow (Corvus macrorhynchos) or a stuffed Japanese marten (Martes melampus) near 
their nest boxes in May and June 2010. Zi and taa calls of willow tits were recorded by exposing a stuffed sparrowhawk (Accipiter 
nisus) near artificial feeders from December 2008 to January 2009. All the source individuals for the playback calls were individually 
identified either by unique combinations of color-rings or by their nest boxes. Calls were recorded with an LS370 parabolic microphone
(Fuji Planning Corporation, Tokyo, Japan) connected to an R-09HR digital audio recorder (sampling wave files at 48 kHz
and 16 bits; Roland Corporation, Shizuoka, Japan) or to an MZ-RH1 Hi-MD walkman (sampling wave files at 44.1 kHz and 16 bits;
Sony Corporation, Tokyo, Japan) at the same field site as this study.

We created playback stimuli by using Adobe Audition 3.0 software. We chose three types of notes (A, B, and C) from our recording 
library of Japanese tits (n = 14 individuals) and two types of notes (zi and taa) from the recording library of willow tits (n = 14 individuals 
for both) on the basis of the sound quality. A, B, and C notes were then combined into an ABC sequence with natural intervals be- 
tween the notes (60-150 ms, measured for each individual of the record source). Zi notes and tää notes were typically produced as a 
string of several notes (zi: 1-10 notes per call; taa: 1-20 notes per call). Thus, we chose calls with four zi notes and those with four taa 
notes as the acoustic modifiers of ABC. 

To generate playback stimuli, we combined ABC calls with either zi or tää calls into ABC-tää, tää-ABC, ABC-zi, and zi-ABC sequences.
The two calls in a given sequence were separated by 0.1 s, which is within the range of intervals between ABC and D notes of Japanese
tits [7]. Then, each call sequence was repeated in a sound file at a rate of 20 call sequences per minute (one sequence every 3 s, total
duration 90 s). This calling rate falls within the range of the natural repetition rate of mobbing call sequences of Japanese tits [10].
Within each playback stimulus, each call sequence was separated from any preceding sequence by background noise of either
1.1 ± 0.1 s (ABC-tää and tää-ABC sequences) or 1.7 ± 0.1 s (ABC-zi and zi-ABC sequences) (mean ± s.d., n = 14 for each). This difference
in intervals reflects the differences in the duration of tää and zi calls (tää: 1.2 ± 0.1 s, zi: 0.7 ± 0.0 s; n = 14 for each). Fourteen
unique playback stimuli were created for each of the four treatments. We saved the sound files in WAV format (16-bit accuracy,
44.1-kHz sampling rate) onto an SD memory card.
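The timing of a playback file described above can be checked with simple arithmetic. The Python sketch below is illustrative only. The 3 s repetition period, the 0.1 s inter-call gap, the 30 repetitions, and the mean tää duration of 1.2 s come from the text. The ABC duration of 0.6 s is an assumption (it is not stated here), chosen so that the computed silence matches the reported mean of about 1.1 s. The function name is hypothetical.

```python
def stimulus_timeline(n_sequences=30, period_s=3.0, first_call_s=0.6,
                      gap_s=0.1, second_call_s=1.2):
    """Return (onsets, silence_s, total_s) for one playback file.

    first_call_s (the ABC call) is an assumed value; the other defaults
    follow the durations and rates given in the text.
    """
    sequence_s = first_call_s + gap_s + second_call_s
    if sequence_s > period_s:
        raise ValueError("a call sequence must fit inside one period")
    # Each sequence starts at a fixed multiple of the repetition period.
    onsets = [i * period_s for i in range(n_sequences)]
    # Background noise between the end of one sequence and the next onset.
    silence_s = period_s - sequence_s
    total_s = n_sequences * period_s
    return onsets, silence_s, total_s


onsets, silence_s, total_s = stimulus_timeline()
print(len(onsets), round(silence_s, 2), total_s)
```

Under the same assumed ABC duration, passing `second_call_s=0.7` for zi calls gives a longer silent interval. This is the direction of the difference the text reports between tää and zi stimuli.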

We conducted playback experiments in mixed deciduous and coniferous forests near Karuizawa, Nagano Prefecture, Japan 
(36°17’-23’N, 138°28’-39’E). First, we located a flock of Japanese tits. Upon finding a flock, we hung an AT-SPG50 loudspeaker 
(Audio-Technica Corporation, Tokyo, Japan) from a tree branch at 1.8 ± 0.1 m (mean ± s.d., n = 56) from the ground. The loudspeaker
was connected to an R-09HR digital audio recorder (Roland Corporation, Shizuoka, Japan) with extension cords, which enabled the
control of playbacks from an observation position ca. 10 m away from the loudspeaker. We chose the individual that was closest to
the loudspeaker as the focal individual and recorded its behavior during the playback. We played back calls at a standardized volume
(75 dB re 20 μPa at 1 m from the loudspeaker measured using an SM-325 sound level meter; AS ONE Corporation, Osaka, Japan),
which is similar to the natural amplitude of calls given by Japanese tits and willow tits.

To determine the tits’ responses to different treatments, we recorded the following behavioral variables during 90 s of playbacks: 
(1) number of horizontal scans: we counted the number of movements that focal birds made with their heads from left to right or right 
to left (approximately a 180° turn) and (2) approaching the loudspeaker: we recorded whether focal birds approached within 2 m of 
the loudspeaker during the playback. The observations were made with binoculars, and these variables were dictated into an R-09HR
digital audio recorder. For birds that flew to the tree holding the loudspeaker, we measured the minimum distance from the bird to the
loudspeaker after the trials.

Trials were carried out under calm and dry weather conditions between 08:45 and 15:30 hr (Japan Standard Time), from 20 October 
to 2 November 2016, when Japanese tits often form mixed-species flocks with willow tits. Treatments with the same call composition 
(e.g., ABC-taa and taa-ABC) were alternated in successive trials, and two-trial blocks with different call composition (i.e., [ABC-taa 
and taa-ABC] or [ABC-zi and zi-ABC]) were played back in turns, so that responses to all four treatments were observed under largely 
similar conditions. In two trials, a black kite (Milvus migrans) or a jungle crow flew close to the focal bird, and tits were apparently 
alerted to these potential threats. In these cases, we stopped the trials and repeated the same treatment in the next place. The min- 
imum distance between experimental sites was 400 m, because previous observations of color-banded individuals showed that this 
distance is enough to ensure that independent data are collected from different individuals [8]. 
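The alternation scheme described above can be written out as a short schedule generator. This Python sketch reflects one reading of the text: two-trial blocks of the same call composition alternate, giving 14 trials per treatment and 56 in total. Which treatment comes first within a block, and which block starts the series, are not specified in the source and are assumptions here. The function name is hypothetical.

```python
from collections import Counter


def trial_schedule(n_per_treatment=14):
    """Build an alternating playback order (assumed within-block order)."""
    blocks = [("ABC-tää", "tää-ABC"), ("ABC-zi", "zi-ABC")]
    schedule = []
    for _ in range(n_per_treatment):
        # One two-trial block per call composition, played back in turns.
        for block in blocks:
            schedule.extend(block)
    return schedule


schedule = trial_schedule()
counts = Counter(schedule)
print(schedule[:4], len(schedule))
```

Interleaving the treatments in this way spreads each of them evenly over the field period, so that day-to-day changes in weather or flock behavior affect all four similarly.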

We collected data from 42 males and 14 females across all four treatments (ABC-taa: 11 males and 3 females; taa-ABC: 12 males 
and 2 females; ABC-zi: 11 males and 3 females; zi-ABC: 8 males and 6 females). The sex of the birds was determined from their sexu- 
ally dimorphic plumage patterns (male Japanese tits have a broader black stripe from the breast to the vent than females). Each focal 
individual was exposed to a unique exemplar of each call sequence to avoid pseudo-replication [36]. 


Comparisons between ABC-first sequences 

We investigated whether tits derive equivalent information (“approach with alertness”) from ABC-tää sequences and synonymous
ABC-D sequences, but not from ABC-zi sequences. In 2015, we conducted playback experiments with ABC-D sequences in the
same manner as the present experiment: we created a total of n = 17 unique exemplars of ABC-D sequences that contain 30 calls
in 90 s files (one call every 3 s), playing them to Japanese tits in the wild (n = 17; 11 males and 6 females) [7]. Using the data from this
previous experiment, we compared the responses of tits to three sequences: ABC-D, ABC-tää, and ABC-zi. We note that
we could not control for the possible effect of the study year.


Acoustic similarity between D and taa calls 

The present experiment is based on the assumption that Japanese tits perceive their own D calls and heterospecific taa calls as 
different call types, but extract similar information from them. However, it is still possible that tits may simply respond to the acoustic 
features shared between these two calls (see [21]). To assess this possibility, we carried out a playback experiment using D calls, taa 
calls, and a novel type of tää call with increased similarity to D calls. Using the software program Audacity 2.1.0 (http://www.audacityteam.org),
we artificially shortened the duration of tää calls (1.23 ± 0.08 s; mean ± s.d., n = 14) to 50% (0.62 ± 0.04 s, n = 14),
producing “short-tää” calls (Figure S1A). These short-tää calls have a similar duration to that of D calls (0.67 ± 0.09 s, n = 14),
while keeping the same frequency-related measures of the original tää calls. If tits respond to D and tää calls simply because of their
acoustic similarity, tits are expected to approach to the novel short-taa calls similarly or more strongly than to taa calls. In contrast, if 
some other mechanism, such as learning [23], explains responses to taa calls, tits should have reduced responses to novel, unfamiliar 
short-taa calls compared to original taa calls. We obtained taa calls (n = 14) comprised of four taa notes from different individual wil- 
low tits and edited them to obtain short-taa calls (n = 14). We also prepared D calls comprised of 7-10 D notes recorded from different 
individual Japanese tits (n = 14). Each of the sound files contains each type of call repeated at a rate of 20 calls per minute (one call 
every 3 s, total duration 180 s). 

To confirm that shortening the duration increased the similarity of taa calls to D calls, we carried out a spectrographic
cross-correlation (SPCC) analysis using the batch correlation function in Raven Pro 1.5.0 [37]. SPCC calculates sound
similarity by comparing two sound spectrograms at all possible time points to produce a correlation coefficient [37]. All the calls
were bandpass filtered at 1000-16000 Hz to reduce the possible influence of background noise in analyses. Spectrograms were input
using a 3 dB filter bandwidth of 248 Hz, with a Hann window function (60% overlap of time grid and 256 samples of frequency grid).
Then, we obtained two matrices of correlation coefficients from the comparisons between D calls and taa calls and between D calls
and short-taa calls. In these matrices, each value varies between 0 and 1; greater values represent greater similarity to D calls. These
matrices were then compared using a Mantel test with 5000 permutations (Figure S1B). This statistical test was conducted using R
for Mac OS X version 3.1.2 [38] and the mantel function in the R package vegan [39].
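The logic of a Mantel permutation test can be illustrated with a minimal sketch. It is written in Python rather than the R used for the actual analysis, and it is not the vegan implementation. It assumes square, symmetric input matrices, a Pearson correlation over the lower triangle, and a one-sided p value computed as (hits + 1) / (permutations + 1). The function names and the toy matrices are hypothetical, and how the SPCC matrices were arranged for the real test is not specified here.

```python
import random
from statistics import mean


def lower_triangle(m):
    """Flatten the strictly lower triangle of a square matrix."""
    return [m[i][j] for i in range(len(m)) for j in range(i)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(a, b, permutations=5000, seed=0):
    """Return (r, p) for a one-sided Mantel permutation test.

    Rows and columns of `a` are permuted together, which keeps the
    structure of the matrix intact while breaking its link to `b`.
    """
    rng = random.Random(seed)
    n = len(a)
    flat_b = lower_triangle(b)
    r_obs = pearson(lower_triangle(a), flat_b)
    idx = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(idx)
        permuted = [[a[idx[i]][idx[j]] for j in range(n)] for i in range(n)]
        if pearson(lower_triangle(permuted), flat_b) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


# Toy example with made-up values (not data from this study).
points_a = [0, 1, 3, 7, 12, 20, 33]
points_b = [0, 2, 3, 9, 11, 22, 30]
matrix_a = [[abs(p - q) for q in points_a] for p in points_a]
matrix_b = [[abs(p - q) for q in points_b] for p in points_b]
r, p = mantel(matrix_a, matrix_b, permutations=5000)
print(f"Mantel r = {r:.3f}, p = {p:.4f}")
```

A small p value indicates that the two matrices are more strongly correlated than expected if their rows and columns were matched at random.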

We conducted this playback experiment in mixed deciduous and coniferous forests near Karuizawa, Nagano Prefecture, Japan 
(36°18’-22’N, 138°31’-39’E). As in the first experiment, we hung the loudspeaker from a tree branch (height: 1.7 ± 0.1 m,
mean ± s.d., n = 42) at a distance of ca. 20 m from a flock of Japanese tits. The loudspeaker was connected to an R-09 HR digital
audio recorder (Roland Corporation, Shizuoka, Japan) with extension cords, which enabled us to control playbacks from an observation
position ca. 10 m away from the loudspeaker. We counted the number of flock members and then played back calls at a
standardized volume (75 dB re 20 µPa at 1 m from the loudspeaker). During the 3 min of call playback, we recorded the number
of Japanese tits that approached within 2 m of the loudspeaker. Observations were made with binoculars, and these variables
were recorded as spoken commentary on an R-09 HR digital audio recorder.

Trials were carried out under calm and dry weather conditions between 08:45 and 15:30 hr (Japan Standard Time), from 25 March
to 30 March 2017, when Japanese tits form mixed-species flocks with willow tits. The order of treatments was counterbalanced so
that responses to all three treatments were observed under largely similar conditions. In two trials, a jungle crow and a sparrowhawk
flew close to the focal flock, and the tits were apparently alerted to them. In these cases, we stopped the trial and repeated the same
treatment at the next site. The minimum distance between experimental sites was 400 m, to ensure the collection of independent
data from different individual tits [8]. We collected data from 42 flocks across all three treatments (14 flocks per treatment).
Each focal flock was exposed to a unique exemplar of each call to avoid pseudo-replication [36].


QUANTIFICATION AND STATISTICAL ANALYSIS 


We analyzed the data using generalized linear models in R [38]. We used a negative binomial error distribution and log-link function
(glm.nb in the R package MASS; [40]) for models analyzing the number of horizontal scans and a binomial error distribution and logit-link
function (glm in the R package stats) for models analyzing the probability of approach behavior (yes or no).

First, we analyzed whether tits respond differently to the four types of playbacks (ABC-taa, taa-ABC, ABC-zi, zi-ABC). We fitted call
ordering (ABC first or last), call composition (ABC + taa or ABC + zi), their interaction term, and sex of the focal bird as fixed terms.
In 12 out of 56 trials, focal individuals flew into a dense canopy during the playbacks. Thus, we defined the observation time as the
duration for which we could record the behavior of the focal individual and included this term in the analysis of horizontal
scans as a log-transformed offset. In these analyses, we found a significant interaction between ordering and composition (scans:
χ² = 7.64, df = 1, p < 0.01; approach: χ² = 7.34, df = 1, p < 0.01). However, we did not find any influence of sex of the focal birds on
either behavioral response (scans: χ² = 0.13, df = 1, p = 0.72; approach: χ² = 1.05, df = 1, p = 0.31). Therefore, we further conducted
post hoc pairwise comparisons of responses between ABC-taa and taa-ABC calls, and between ABC-zi and zi-ABC calls, using least-square
means (lsmeans in the R package lsmeans; [41]), with sex excluded from the models. We used likelihood ratio tests
to calculate the significance of each fixed term.
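The role of the log-transformed offset can be made concrete with a short sketch. Under a log link, adding log(observation time) to the linear predictor means that the model describes the number of scans per unit of observation time, so a bird observed for half as long is expected to produce half as many scans. The function name, the value of the linear predictor, and the use of seconds as the time unit are all hypothetical and chosen only for illustration.

```python
import math


def expected_scans(linear_predictor, observation_time_s):
    """Expected scan count under a log link with a log(time) offset.

    log(E[scans]) = linear_predictor + log(observation_time)
    which is equivalent to E[scans] = observation_time * exp(linear_predictor).
    """
    offset = math.log(observation_time_s)
    return math.exp(linear_predictor + offset)


# Hypothetical linear predictor (log of scans per second), not a fitted value.
eta = math.log(0.2)

full_trial = expected_scans(eta, 90.0)   # bird observed for 90 s
cut_short = expected_scans(eta, 45.0)    # bird lost from view after 45 s
print(full_trial, cut_short)
```

Because the offset enters with a fixed coefficient of one, the fitted treatment effects compare scan rates rather than raw counts, which keeps trials with truncated observation comparable to complete ones.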

For comparisons between ABC-taa, ABC-zi, and ABC-D sequences, we fitted playback treatment and sex as fixed terms. In 17
out of 45 trials, tits flew out of sight during the observation. Thus, we measured the time duration in which we could record the behavior
of the focal individual and included this term in the analysis of horizontal scans as a log-transformed offset. In the primary analyses,
call treatment had a significant effect on both responses (scans: χ² = 7.84, df = 2, p = 0.02; speaker approach: χ² = 21.42, df = 2, p <
0.0001), whereas sex had no significant influence on either response (scans: χ² = 2.17, df = 1, p = 0.14; speaker approach: χ² = 0.08,
df = 1, p = 0.77). Therefore, we further compared the responses of tits between the three call treatments by using least-square means
(lsmeans in the R package lsmeans; [41]), with sex excluded from the models.

Data from playbacks of D, taa, and short-taa calls were analyzed using generalized linear models (glm in the R package stats) with a
binomial error structure and logit link function for the primary analysis. We fitted the proportion of flock members that approached within
2 m of the loudspeaker as the dependent variable and call treatment as a fixed term. Responses of tits to the three call treatments
were then compared by using least-square means (lsmeans in the R package lsmeans; [41]).

We used log likelihood ratio tests for calculating p values. When making multiple comparisons, a false discovery rate control [42]
was used to adjust p values. All tests were two-tailed and the significance level was set at α = 0.05.
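As an illustration of how a false discovery rate adjustment changes a set of p values, the following Python sketch implements the Benjamini-Hochberg step-up procedure. This is an assumption: the text states only that a false discovery rate control [42] was used, and the cited method may differ in detail. The function name and the example p values are hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p values, returned in the input order.

    Each p value is multiplied by m / rank (rank 1 = smallest), and the
    results are made monotone by taking a running minimum from the largest
    rank downward.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p values for three pairwise comparisons (not results from this study).
raw = [0.01, 0.04, 0.03]
alpha = 0.05
for p_raw, p_adj in zip(raw, bh_adjust(raw)):
    print(p_raw, round(p_adj, 4), p_adj < alpha)
```

A comparison is treated as significant when its adjusted p value falls below the chosen significance level.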


DATA AND SOFTWARE AVAILABILITY 
All software required to perform the analyses described in the Quantification and Statistical Analysis section is freely available
to download for the open-source R program. Raw data have been deposited in Mendeley Data at http://dx.doi.org/10.17632/r7v96zf5pp.1.